Critical Computational Geographies (Part I): Mapping and Spatial Analysis
Nathan Alexander, PhD
Center for Applied Data Science and Analytics
Howard University
This workshop series will support new and experienced R users with the basics of mapping and spatial analysis. We will use U.S. Census data and the tidycensus package.
Why is spatial data a useful form of information for critical analysis?
Data
For mapping tasks, we use census data. The core tidycensus functions retrieve data from the decennial census (get_decennial()) and the American Community Survey (ACS) (get_acs()).
Today, we will use the get_acs() function for ACS data. The ACS is an annual survey that covers topics not available in the decennial US Census data, such as income and education. Estimates are available for both 1- and 5-year periods. The default for the get_acs() function is the 5-year ACS estimates.
The data are delivered as estimates (the estimate column) accompanied by margins of error (the moe column).
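As a rough illustration of the 1-year versus 5-year choice, the sketch below requests the same variable both ways using the survey argument of get_acs(). The object names income_5yr and income_1yr are hypothetical, the year is left at whatever default the installed tidycensus version uses, and state-level geography is chosen because 1-year estimates are only published for areas with populations of 65,000 or more.

```r
library(tidycensus)

# A Census API key may be needed; census_api_key("YOUR_KEY") registers one.

# Default behaviour: 5-year ACS estimates (survey = "acs5").
income_5yr <- get_acs(
  geography = "state",
  variables = "B19013_001"   # the variable code shown in this workshop's tables
)

# 1-year ACS estimates, requested explicitly.
income_1yr <- get_acs(
  geography = "state",
  variables = "B19013_001",
  survey = "acs1"
)

head(income_5yr)
head(income_1yr)
```

The 5-year estimates pool more responses, so they tend to have smaller margins of error and cover small geographies such as tracts; the 1-year estimates are more current but limited to larger areas.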
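To make the role of the margin of error concrete, here is a minimal sketch that turns an estimate and its moe into an interval. The two rows are copied from the North Carolina county table shown later in this workshop; the data frame name acs_example and the added column names are hypothetical. ACS margins of error are published at the 90 percent confidence level, so the interval below is read as a 90 percent interval.

```r
# Two rows copied from the NC county output in this workshop.
acs_example <- data.frame(
  NAME     = c("Alamance County, North Carolina", "Bertie County, North Carolina"),
  estimate = c(64445, 45931),
  moe      = c(2226, 6859)
)

# Interval bounds: estimate minus/plus the margin of error.
acs_example$lower <- acs_example$estimate - acs_example$moe
acs_example$upper <- acs_example$estimate + acs_example$moe

# Margin of error as a share of the estimate (larger = less reliable).
acs_example$moe_share <- acs_example$moe / acs_example$estimate

acs_example
```

A large moe_share is a signal to interpret a value cautiously before mapping it, since neighboring areas may not differ as much as their colors suggest.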
tidycensus
The tidycensus package created by Kyle Walker “automatically downloads and merges Census geometries to data for mapping [and] includes a variety of analytic tools to support common Census workflows.”
States and counties can be requested by name, so there is no need to look up FIPS codes.
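For readers curious about the codes that the names stand in for, tidycensus ships a lookup table called fips_codes. The sketch below filters it for one county; the object name meck is hypothetical, and the example assumes the county is listed as "Mecklenburg County" in that table.

```r
library(tidycensus)

# fips_codes is a data frame bundled with tidycensus (no API call needed).
meck <- subset(
  fips_codes,
  state == "NC" & county == "Mecklenburg County"
)

# Show the name-to-code mapping for this county.
meck[, c("state", "state_code", "county", "county_code")]

# In a request, the names can be used directly, for example
#   state = "NC", county = "Mecklenburg"
# instead of the numeric codes returned above.
```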
Mapping tools
Several R packages support cartography; popular options include ggplot2, tmap, and mapsf. Walker has also built a newer package called mapgl.
mapgl
mapgl is a new R package written by Walker for high-performance interactive mapping.
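A minimal sketch of what an interactive mapgl map can look like follows. To keep it self-contained it uses the North Carolina counties file bundled with the sf package rather than Census income data; the layer id "nc_births", the choice of the BIR74 column, and the colors are illustrative assumptions, not part of this workshop's materials.

```r
library(sf)
library(mapgl)

# NC counties shipped with sf (illustrative data, not ACS income data).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 4326)  # longitude/latitude coordinates

# Interactive choropleth: fill color interpolated over the BIR74 column.
maplibre(bounds = nc) |>
  add_fill_layer(
    id = "nc_births",
    source = nc,
    fill_color = interpolate(
      column = "BIR74",
      values = range(nc$BIR74),
      stops = c("#edf8fb", "#006d2c"),
      na_color = "lightgrey"
    ),
    fill_opacity = 0.7
  )
```

The same pattern should carry over to an sf object returned by tidycensus by swapping in that object as the source and "estimate" as the column.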
Start by installing/loading packages (as needed) and libraries:
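A minimal sketch of the setup, followed by a request that would return tract-level output shaped like the table below. The exact package list, the object name dc_income, and the use of the default survey year are assumptions; the variable code B19013_001 and the DC tract geography are taken from the output itself.

```r
# install.packages(c("tidycensus", "sf", "mapgl"))  # run once, if needed
library(tidycensus)
library(sf)
library(mapgl)

# A Census API key may be needed; uncomment and supply your own.
# census_api_key("YOUR_KEY")

# Tract-level estimates for the District of Columbia.
dc_income <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "DC"
)

dc_income
```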
# A tibble: 206 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 11001000101 Census Tract 1.01; District of Columbia;… B19013_… 135708 43153
2 11001000102 Census Tract 1.02; District of Columbia;… B19013_… 159583 70430
3 11001000201 Census Tract 2.01; District of Columbia;… B19013_… NA NA
4 11001000202 Census Tract 2.02; District of Columbia;… B19013_… 152059 60482
5 11001000300 Census Tract 3; District of Columbia; Di… B19013_… 174470 21374
6 11001000400 Census Tract 4; District of Columbia; Di… B19013_… 188929 71412
7 11001000501 Census Tract 5.01; District of Columbia;… B19013_… 109116 18172
8 11001000502 Census Tract 5.02; District of Columbia;… B19013_… 157344 32376
9 11001000600 Census Tract 6; District of Columbia; Di… B19013_… 183421 33830
10 11001000702 Census Tract 7.02; District of Columbia;… B19013_… 87750 16226
# ℹ 196 more rows
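The county-level table that follows has the shape one would expect from a request like the sketch below. The object name nc_income is hypothetical and the survey year is left at the package default, so exact values may differ from those shown.

```r
library(tidycensus)

# County-level estimates for North Carolina (100 counties).
nc_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "NC"
)

nc_income
```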
# A tibble: 100 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 37001 Alamance County, North Carolina B19013_001 64445 2226
2 37003 Alexander County, North Carolina B19013_001 65268 5441
3 37005 Alleghany County, North Carolina B19013_001 44272 3954
4 37007 Anson County, North Carolina B19013_001 44245 3895
5 37009 Ashe County, North Carolina B19013_001 50827 3707
6 37011 Avery County, North Carolina B19013_001 57657 4054
7 37013 Beaufort County, North Carolina B19013_001 57997 3520
8 37015 Bertie County, North Carolina B19013_001 45931 6859
9 37017 Bladen County, North Carolina B19013_001 44528 4772
10 37019 Brunswick County, North Carolina B19013_001 74034 2420
# ℹ 90 more rows
Spatial data
We’ll now attach spatial data to our income data by requesting simple features (sf) geometry with geometry = TRUE. We’ll use the same code as above and simply add one new last line.
We’ll set geometry = TRUE and update the name of our data frame.
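A sketch of that change, assuming the earlier request used the arguments shown here. The object name dc_income_sf matches the name used in the plotting code later in this workshop; the survey year is left at the package default.

```r
library(tidycensus)
library(sf)

# Same tract-level request for DC, with geometry added as the last argument.
dc_income_sf <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "DC",
  geometry = TRUE   # return an sf object with a geometry column
)

dc_income_sf
```

Writing TRUE rather than T is a small safety habit, since T is an ordinary variable that can be overwritten while TRUE cannot.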
Simple feature collection with 206 features and 5 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -77.11976 ymin: 38.79165 xmax: -76.9094 ymax: 38.99511
Geodetic CRS: NAD83
First 10 features:
GEOID NAME
1 11001000201 Census Tract 2.01; District of Columbia; District of Columbia
2 11001010300 Census Tract 103; District of Columbia; District of Columbia
3 11001002801 Census Tract 28.01; District of Columbia; District of Columbia
4 11001004002 Census Tract 40.02; District of Columbia; District of Columbia
5 11001006700 Census Tract 67; District of Columbia; District of Columbia
6 11001007707 Census Tract 77.07; District of Columbia; District of Columbia
7 11001008803 Census Tract 88.03; District of Columbia; District of Columbia
8 11001009302 Census Tract 93.02; District of Columbia; District of Columbia
9 11001009509 Census Tract 95.09; District of Columbia; District of Columbia
10 11001002802 Census Tract 28.02; District of Columbia; District of Columbia
variable estimate moe geometry
1 B19013_001 NA NA POLYGON ((-77.07902 38.9126...
2 B19013_001 112778 40979 POLYGON ((-77.03636 38.9748...
3 B19013_001 86029 16177 POLYGON ((-77.03645 38.9349...
4 B19013_001 167102 45557 POLYGON ((-77.04627 38.9166...
5 B19013_001 166250 29702 POLYGON ((-76.99496 38.8898...
6 B19013_001 59512 32728 POLYGON ((-76.94486 38.8790...
7 B19013_001 91317 14597 POLYGON ((-77.00173 38.9099...
8 B19013_001 106364 29915 POLYGON ((-76.99494 38.9239...
9 B19013_001 116613 28202 POLYGON ((-77.00201 38.9510...
10 B19013_001 82675 14933 POLYGON ((-77.03671 38.9271...
NC spatial data
Next, we gather spatial data for NC at the county level.
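A sketch of the county-level request with geometry, mirroring the DC example. The object name nc_income_sf matches the name used in the plotting code later in this workshop; the remaining arguments are assumed to be the same as in the non-spatial NC request.

```r
library(tidycensus)
library(sf)

# County-level estimates for NC, returned as an sf object.
nc_income_sf <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "NC",
  geometry = TRUE
)

nc_income_sf
```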
We can use our spatial data frames to generate a plot using the plot() function.
DC
We’ll use the DC income data to plot. Recall that we need to use our sf data frame.
plot(dc_income_sf)
Note that we get multiple maps, one for each column in our data set, which is likely not what you want. We’ll need to specify that we want only the estimates.
plot(dc_income_sf['estimate'])
NC
We’ll do the same for NC.
plot(nc_income_sf['estimate'])
At the state level, it is very clear what the purpose of our analysis would be. We’ll likely want to follow a specific research question from this point.
I will modify the code to look at a specific county in NC, Mecklenburg County.
Take note of the arguments in the updated code:
- geography = "tract"
- state = "NC"
- county = "Mecklenburg"
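Put together, the modified request might look like the sketch below. The object name meck_income_sf is hypothetical, and the variable and geometry arguments are assumed to carry over unchanged from the earlier examples.

```r
library(tidycensus)
library(sf)

# Tract-level estimates for Mecklenburg County, NC, with geometry.
meck_income_sf <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "NC",
  county = "Mecklenburg",
  geometry = TRUE
)

# Map only the estimate column, as in the DC and NC plots.
plot(meck_income_sf["estimate"])
```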